Dataset for identification of queerphobia

Abstract

While social media platforms have implemented many algorithmic approaches to moderating hate speech, there is a lack of datasets on queerphobia, which has impeded efforts to automatically recognize and moderate queerphobic speech online. Queerphobic speech is speech that is intended to degrade, insult, or incite violence or prejudicial action against queer people, who are those from a sexuality, gender, or romantic minority. This results in worsened mental and emotional outcomes for queer people and can contribute to anti-queer violence. The goal of this study is to create a dataset of YouTube comments to further efforts to identify queerphobic speech. To construct the dataset, 10,000 comments were sourced from YouTube videos that represent queerness. Then, volunteers manually annotated each comment in accordance with specific guidelines. Various natural language processing (NLP) models were used to extract features from the text, and several classifiers used these features to categorize the text as queerphobic or non-queerphobic. These NLP models illustrate baseline performance on the data. In making the dataset, we hope to further research in the recognition of queerphobia in digital spaces and make online spaces safer for queer people. The dataset can be found at https://github.com/ShivumB/dataset-for-identification-of-queerphobia.

Similar resources

Cross Dataset Person Re-identification

Until now, most existing research on person re-identification has aimed at improving the recognition rate in a single-dataset setting. The training data and testing data of these methods come from the same source. Although they have obtained high recognition rates in experiments, they usually perform poorly in practical applications. In this paper, we focus on the cross dataset person re-identification...

Speaker Identification with VoxCeleb DataSet

In this project, we perform a text-independent speaker identification experiment with a newly released data set, VoxCeleb (2017)[1], which consists of celebrity interview audio clips downloaded from YouTube. It is a challenging data set in the sense that there are often multiple vocal sources in the same clip. A Deep Neural Network (DNN) based on MFCC feature vectors is used as our baseline. It is c...

Arabian Horse Identification Benchmark Dataset

The lack of a standard muzzle print database is a challenge for conducting research on Arabian horse identification systems. Therefore, collecting a database of muzzle print images is a crucial step. The dataset presented in this paper is an option for studies that need a dataset for testing and comparing algorithms under development for Arabian horse identification. Our collected da...

VoxCeleb: A Large-Scale Speaker Identification Dataset

Most existing datasets for speaker identification contain samples obtained under quite constrained conditions and are usually hand-annotated, hence limited in size. The goal of this paper is to generate a large-scale text-independent speaker identification dataset collected ‘in the wild’. We make two contributions. First, we propose a fully automated pipeline based on computer vision technique...

VISION: a video and image dataset for source identification

The forensic research community keeps proposing new techniques to analyze digital images and videos. However, the performance of proposed tools is usually tested on data that are far from reality in terms of resolution, source device, and processing history. Remarkably, in recent years, portable devices have become the preferred means to capture images and videos, and contents are commonly shared t...

Journal

Journal title: Journal of Student Research

Year: 2023

ISSN: 2167-1907

DOI: https://doi.org/10.47611/jsrhs.v12i1.4405